1. We consider a controlled stochastic linear system

$$Z_s = x + X_{s-t} + R_{s-t} - L_{s-t}. \qquad (1)$$


Here $X$ is a $(\mu, \sigma^2)$-Brownian motion and $R$ and $L$ are the control functionals, which are increasing and adapted to the $\sigma$-field generated by the process $X$. For the policy $S = (L,R)$ the expected cost takes the form

$$K_S(x,t) = E\Big\{ \int_t^T h(Z_s, s)\, e^{-\gamma(s-t)}\, ds + \ell \int_t^T e^{-\gamma(s-t)}\, dL_{s-t} + r \int_t^T e^{-\gamma(s-t)}\, dR_{s-t} \Big\}. \qquad (2)$$


Here $h$, $\ell$ and $r$ stand for the holding cost and the unit costs of displacement to the left and to the right respectively, and $\gamma > 0$ is the discount factor. Our objective is to characterize the optimal cost (the value function)


$$V^*(x,t) = \min_S K_S(x,t) \qquad (3)$$

and to describe the optimal policy $S^* = (L^*, R^*)$ for which $V^* = K_{S^*}$. The value function satisfies the Hamilton-Jacobi-Bellman equation (cf. [2])

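$$0 = \min\Big\{ \frac{\partial V}{\partial t}(x,t) + \Gamma V(x,t) - \gamma V(x,t) + h(x,t),\ DV(x,t) + r,\ -DV(x,t) + \ell \Big\}, \qquad (4)$$

$$0 = V(x,T),$$

where $D = \partial/\partial x$ and

$$\Gamma = \frac{\sigma^2}{2} \frac{\partial^2}{\partial x^2} + \mu \frac{\partial}{\partial x}. \qquad (5)$$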


Our main technical assumptions are similar to the ones in [2]. We assume that $h(x,t)$ is a nonnegative function such that there exist constants $m$ and $0 < c \le C$ such that for every $x$, $x'$, $t$, $t'$

$$c|x|^m - C \le h(x,t) \le C(1 + |x|^m),$$

$$|h(x,t) - h(x',t)| \le C(1 + |x|^{m-1})\, |x - x'|, \qquad (6)$$

$$|h(x,t) - h(x,t')| \le C(1 + |x|^m)\, |t - t'|,$$

$$0 \le \frac{\partial^2 h}{\partial x^2}(x,t) \le C(1 + |x|^q), \qquad q = (m-2)^+. \qquad (7)$$


Theorem 1. Under the assumptions (6), (7), there exists a unique solution $V^*$ to the equation (4). This solution is the value function (3) of the control problem (1), (2). There exists an optimal policy $S^* = (L^*, R^*)$ for which $V^* = K_{S^*}$. If

$$x^*(t) = \min\{x : DV(x,t) = \ell\},$$

$$x_*(t) = \max\{x : DV(x,t) = -r\},$$

then for $Z^*$ given by (1)

$$x_*(s) \le Z^*_s \le x^*(s),$$

$$\int_t^T 1_{\{Z^*_s < x^*(s)\}}\, dL^*_{s-t} = 0,$$

$$\int_t^T 1_{\{Z^*_s > x_*(s)\}}\, dR^*_{s-t} = 0.$$


The above theorem shows that the optimal control consists of reflecting the controlled process $Z^*$ at a time-dependent (a priori unknown) boundary.
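To make the reflection mechanism concrete, the following is a minimal Python sketch of an Euler discretization of (1) under a two-sided reflecting policy, together with a Riemann-sum approximation of the cost (2) along one path. The boundary functions `lower` and `upper`, the step count, the parameter values and the function names are hypothetical choices made for illustration; the optimal boundaries are not known in closed form and are not computed here.

```python
import math
import random


def simulate_reflected(x, t, T, mu, sigma, lower, upper, n_steps=1000, seed=0):
    """Euler scheme for Z_s = x + X_{s-t} + R_{s-t} - L_{s-t}, kept in [lower(s), upper(s)].

    lower and upper are hypothetical boundary functions with lower(s) <= upper(s).
    Returns a list of tuples (s, Z_s, L, R) with cumulative controls L and R.
    """
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    # An initial jump moves a starting point outside the band onto the nearest boundary.
    R = max(lower(t) - x, 0.0)
    L = max(x - upper(t), 0.0)
    z = x + R - L
    path = [(t, z, L, R)]
    for k in range(1, n_steps + 1):
        s = t + k * dt
        # Uncontrolled (mu, sigma^2)-Brownian increment.
        free = z + mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Minimal pushes that bring the state back into the band.
        dR = max(lower(s) - free, 0.0)
        dL = max(free - upper(s), 0.0)
        z = free + dR - dL
        R += dR
        L += dL
        path.append((s, z, L, R))
    return path


def discounted_cost(path, h, ell, r, gamma):
    """Riemann-sum approximation of the cost (2) along one simulated path."""
    t0, _, L0, R0 = path[0]
    cost = ell * L0 + r * R0  # cost of the initial jump, undiscounted at s = t
    for (s0, z0, La, Ra), (s1, _, Lb, Rb) in zip(path, path[1:]):
        disc = math.exp(-gamma * (s0 - t0))
        cost += disc * (h(z0, s0) * (s1 - s0) + ell * (Lb - La) + r * (Rb - Ra))
    return cost


# Illustrative parameters only: quadratic holding cost and a linearly widening band.
path = simulate_reflected(
    x=0.0, t=0.0, T=1.0, mu=0.1, sigma=1.0,
    lower=lambda s: -0.5 - 0.1 * s, upper=lambda s: 0.5 + 0.1 * s,
)
cost = discounted_cost(path, h=lambda z, s: z * z, ell=1.0, r=1.0, gamma=0.5)
```

Averaging `discounted_cost` over many seeds would give a Monte Carlo estimate of $K_S(x,t)$ for this particular (not necessarily optimal) band policy.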


Let $\mathcal{V} = \{(x,t) : x_*(t) \le x \le x^*(t)\}$ and let $W(x,t) = DV(x,t)$. By formally differentiating (4) we get

$$\frac{\partial W}{\partial t}(x,t) + \Gamma W(x,t) - \gamma W(x,t) + H(x,t) = 0 \quad \text{if } (x,t) \in \mathcal{V}, \qquad (10)$$

$$W(x,t) \le \ell \quad \text{for all } x \in \mathbb{R},\ 0 \le t \le T, \qquad (11)$$

$$W(x,t) \ge -r \quad \text{for all } x \in \mathbb{R},\ 0 \le t \le T, \qquad (12)$$

$$W(x,T) = 0, \qquad (13)$$

where $H = Dh$ and all equalities and inequalities are understood in the sense of generalized functions.

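Assume that $H(0,t) = 0$, i.e. $0 = \arg\min_x h(x,t)$. Consider the following minimax problem (a game of two persons)

$$W(x,t) = \sup_{\sigma} \inf_{\tau} E\Big\{ \int_t^{\tau \wedge \sigma \wedge T} e^{-\gamma(s-t)} H(x + X_{s-t}, s)\, ds + \ell e^{-\gamma(\tau-t)} 1_{\{\tau<\sigma,\ \tau<T\}} - r e^{-\gamma(\sigma-t)} 1_{\{\sigma<\tau,\ \sigma<T\}} \Big\}, \qquad (14)$$

where the sup is taken over all stopping times $\sigma \ge t$ such that $x + X_{\sigma-t} < 0$ and the inf is taken over all stopping times $\tau \ge t$ such that $x + X_{\tau-t} > 0$.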


Theorem 2. The optimal stopping game described above has a value; that is, the right-hand side of (14) does not change if sup inf is replaced by inf sup. The value of the game $W$ satisfies (10)-(13) and is related to the value function $V$ by

$$W = DV.$$


2. Suppose $h$ does not depend on $t$ and we consider an infinite horizon optimization problem

$$V(x) = \inf_{R,L} E\Big\{ \int_0^\infty e^{-\gamma s} h(Z_s)\, ds + \int_0^\infty r e^{-\gamma s}\, dR_s + \int_0^\infty \ell e^{-\gamma s}\, dL_s \Big\}, \qquad (15)$$

where $Z_s$ is given by (1) with $t = 0$.




The Hamilton-Jacobi-Bellman equation for the value function $V$ given by (15) reduces to an ordinary differential equation with gradient constraints

$$0 = \min\{\Gamma V(x) - \gamma V(x) + h(x),\ V'(x) + r,\ \ell - V'(x)\}. \qquad (16)$$


In the case of infinite horizon control, we can loosen the assumption on $h$; namely, we assume that $h$ is a nonnegative convex $C^1$ function and

$$|h'(x)| \to \infty \quad \text{as } |x| \to \infty. \qquad (17)$$


Theorem 3. Assume that (17) holds and $r, \ell > 0$. Then there exists a unique solution $V^*(x)$ to (16). There exists a unique optimal policy $R^*$, $L^*$. If

$$a = \inf\Big\{x : \frac{dV^*}{dx}(x) > -r\Big\} \quad \text{and} \quad b = \sup\Big\{x : \frac{dV^*}{dx}(x) < \ell\Big\},$$

then for all $t > 0$

$$a \le Z^*_t \le b,$$

where $Z^*_t = x + X_t + R^*_t - L^*_t$. Moreover

$$\int_0^\infty 1_{\{Z^*_t > a\}}\, dR^*_t = \int_0^\infty 1_{\{Z^*_t < b\}}\, dL^*_t = 0.$$


The above theorem shows that the optimal control in the infinite horizon problem consists of keeping the controlled process $Z^*$ inside the interval $[a,b]$, reflecting it at the boundaries.
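As a worked illustration of how the interval $[a,b]$ can be found in a special case, the Python sketch below treats $\mu = 0$, $h(x) = x^2$ and $r = \ell$, so that by symmetry $a = -b$. On $(-b, b)$ the first term of (16) vanishes, which gives $V(x) = A\cosh(\kappa x) + x^2/\gamma + \sigma^2/\gamma^2$ with $\kappa = \sqrt{2\gamma}/\sigma$. The sketch additionally assumes the usual smooth-fit condition $V''(\pm b) = 0$, which is not stated in the text above; the parameter values and function names are hypothetical.

```python
import math


def symmetric_barrier(gamma, sigma, ell):
    """Solve V'(b) = ell together with V''(b) = 0 for the barrier b > 0 by bisection."""
    kappa = math.sqrt(2.0 * gamma) / sigma

    def f(b):
        # V'(b) - ell after eliminating the constant A through V''(b) = 0.
        return (2.0 / gamma) * (b - math.tanh(kappa * b) / kappa) - ell

    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:  # f is increasing and unbounded, so this loop terminates
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    # Smooth fit (assumed): A * kappa^2 * cosh(kappa * b) + 2 / gamma = 0.
    A = -2.0 / (gamma * kappa ** 2 * math.cosh(kappa * b))
    return b, A, kappa


def dV(x, b, A, kappa, gamma, ell):
    """Derivative of the candidate value function; constant outside [-b, b]."""
    if x >= b:
        return ell
    if x <= -b:
        return -ell
    return A * kappa * math.sinh(kappa * x) + 2.0 * x / gamma


# Illustrative parameter values only.
gamma, sigma, ell = 1.0, 1.0, 1.0
b, A, kappa = symmetric_barrier(gamma, sigma, ell)
```

In this special case the derivative returned by `dV` increases from $-\ell$ at $-b$ to $\ell$ at $b$, so both gradient constraints in (16) hold on the whole line.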

We want to establish the correspondence between optimal control problems and two-person games of optimal stopping. For simplicity, we assume that $h$ attains its minimum at the point 0.

Consider an optimal stopping game of two persons,


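$$W(x) = \sup_{\sigma} \inf_{\tau} E\Big\{ \int_0^{\tau \wedge \sigma} e^{-\gamma t} h'(x + X_t)\, dt + \ell e^{-\gamma \tau} 1_{\{\tau < \sigma\}} - r e^{-\gamma \sigma} 1_{\{\sigma < \tau\}} \Big\}, \qquad (18)$$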


where the sup is taken over all stopping times $\sigma$ such that $x + X_\sigma < 0$ and the inf is taken over all stopping times $\tau$ such that $x + X_\tau > 0$.

Theorem 4. The quantity on the right-hand side of (18) does not change if sup inf is changed to inf sup. The value of the game $W$ given by (18) is equal to the derivative of $V^*$ given by (16). The optimal policies $\sigma^*$ and $\tau^*$ in (18) are given by

$$\sigma^* = \inf\{t : x + X_t \le a\},$$

$$\tau^* = \inf\{t : x + X_t \ge b\},$$

where $a$ and $b$ are the same as in Theorem 3.
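As a formal sanity check of the identity between the game value and the derivative of $V^*$, one can differentiate (16) on the interval $(a,b)$. The sketch below assumes that $V^*$ is smooth enough on $(a,b)$ for this differentiation to be legitimate, writes $\Gamma$ for the generator of the $(\mu,\sigma^2)$-Brownian motion, and introduces $T_a$ and $T_b$ for the first hitting times of $a$ and $b$ by $x + X_t$; this notation is an illustrative assumption and is not taken from the text.

```latex
\begin{aligned}
&\Gamma V^*(x) - \gamma V^*(x) + h(x) = 0, \qquad a < x < b,
  \qquad \Gamma = \tfrac{\sigma^2}{2}\tfrac{d^2}{dx^2} + \mu\tfrac{d}{dx}, \\
&\Gamma W(x) - \gamma W(x) + h'(x) = 0, \qquad W = \tfrac{dV^*}{dx},
  \qquad W(a) = -r, \quad W(b) = \ell, \\
&W(x) = E\Big\{ \int_0^{T_a \wedge T_b} e^{-\gamma t} h'(x + X_t)\,dt
  + \ell e^{-\gamma T_b} 1_{\{T_b < T_a\}} - r e^{-\gamma T_a} 1_{\{T_a < T_b\}} \Big\}.
\end{aligned}
```

The last line is the Feynman-Kac representation of the boundary value problem in the second line. It has the form of the payoff in (18) when both players stop at the first exit of $x + X_t$ from $(a,b)$.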

Similar results were obtained in [7] for the problem with an average (per unit of time) criterion.